55 research outputs found

    Cure the headache of Transformers via Collinear Constrained Attention

    As practical applications based on Large Language Models progress rapidly, extrapolation performance has become increasingly important in research. In our study, we identified a previously overlooked anomalous behavior in Transformer models that causes chaos around the closest tokens, which carry the most important information. We call this discovery the "headache of Transformers". To address it at its core, we introduce a novel self-attention structure named Collinear Constrained Attention (CoCA). This structure can be seamlessly integrated with existing extrapolation and interpolation methods, and with other optimization strategies designed for traditional Transformer models. Without any fine-tuning, our model achieves excellent extrapolation performance at 16 to 24 times the sequence length during inference. We have also improved CoCA's computational and spatial efficiency to ensure its practicality. We plan to open-source CoCA shortly. In the meantime, we have made our code available in the appendix for reproducing the experiments. Comment: 16 pages, 6 figures

    Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling

    Transformer-based models have achieved great success on sentence pair modeling tasks, such as answer selection and natural language inference (NLI). These models generally perform cross-attention over input pairs, leading to prohibitive computational costs. Recent studies propose dual-encoder and late interaction architectures for faster computation. However, the balance between the expressiveness of cross-attention and the computational speedup still needs to be better coordinated. To this end, this paper introduces MixEncoder, a novel paradigm for efficient sentence pair modeling. MixEncoder uses a light-weight cross-attention mechanism: it encodes the query only once while modeling the query-candidate interactions in parallel. Extensive experiments on four tasks demonstrate that MixEncoder can speed up sentence pairing by over 113x while achieving performance comparable to the more expensive cross-attention models. Comment: Accepted to EMNLP 202

    Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit

    Adapting Deep Learning (DL) techniques to automate non-trivial coding activities, such as code documentation and defect detection, has been intensively studied recently. Learning to predict code changes is one of the popular and essential investigations. Prior studies have shown that DL techniques such as Neural Machine Translation (NMT) can benefit meaningful code changes, including bug fixing and code refactoring. However, NMT models may encounter a bottleneck when modeling long sequences and are thus limited in accurately predicting code changes. In this work, we design a Transformer-based approach, considering that the Transformer has proven effective in capturing long-term dependencies. Specifically, we propose a novel model named DTrans. To better incorporate the local structure of code, i.e., statement-level information in this paper, DTrans is designed with dynamically relative position encoding in the multi-head attention of the Transformer. Experiments on benchmark datasets demonstrate that DTrans generates patches more accurately than the state-of-the-art methods, increasing performance by at least 5.45%-46.57% in terms of the exact match metric on different datasets. Moreover, DTrans can locate the lines to change with 1.75%-24.21% higher accuracy than the existing methods.

    Precursors and Pathways Leading to Enhanced Secondary Organic Aerosol Formation during Severe Haze Episodes

    Publisher Copyright: © 2021 American Chemical Society. Molecular analyses help to investigate the key precursors and chemical processes of secondary organic aerosol (SOA) formation. We obtained the sources and molecular compositions of organic aerosol in PM2.5 in winter in Beijing by online and offline mass spectrometer measurements. Photochemical and aqueous processing were both involved in producing SOA during the haze events. Aromatics, isoprene, long-chain alkanes or alkenes, and carbonyls such as glyoxal and methylglyoxal were all important precursors. The enhanced SOA formation during the severe haze event was predominantly contributed by aqueous processing that was promoted by elevated amounts of aerosol water for which multifunctional organic nitrates contributed the most followed by organic compounds having four oxygen atoms in their formulae. The latter included dicarboxylic acids and various oxidation products from isoprene and aromatics as well as products or oligomers from methylglyoxal aqueous uptake. Nitrated phenols, organosulfates, and methanesulfonic acid were also important SOA products but their contributions to the elevated SOA mass during the severe haze event were minor. Our results highlight the importance of reducing nitrogen oxides and nitrate for future SOA control. Additionally, the formation of highly oxygenated long-chain molecules with a low degree of unsaturation in polluted urban environments requires further research. Peer reviewed

    Dense Feature Aggregation and Pruning for RGBT Tracking

    How to fuse information from different modalities effectively is a core factor in boosting the performance of RGBT tracking. This paper presents a novel deep fusion algorithm based on the representations from an end-to-end trained convolutional neural network. To exploit the complementarity of features from all layers, we propose a recursive strategy that densely aggregates these features, yielding robust representations of target objects in each modality. Across modalities, we propose to prune the densely aggregated features of all modalities in a collaborative way. Specifically, we employ global average pooling and weighted random selection to perform channel scoring and selection, which can remove redundant and noisy features to achieve a more robust feature representation. Experimental results on two RGBT tracking benchmark datasets suggest that our tracker achieves clear state-of-the-art performance against other RGB and RGBT tracking methods. Comment: arXiv admin note: text overlap with arXiv:1811.0985
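
The channel scoring and selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the clamping of negative scores, the small epsilon, and the sequential sampling without replacement are all assumptions made for illustration, and the abstract does not say how the pooled scores are turned into selection weights.

```python
import random


def global_average_pool(channel):
    """Score one channel (a 2-D list of activations) by its mean value."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def select_channels(feature_map, k, rng):
    """Keep k channels, sampled without replacement in proportion to score.

    feature_map is a list of channels, each a 2-D list of floats.
    Assumption: scores act as sampling weights; negatives are clamped to
    zero and a tiny epsilon keeps the total weight positive.
    """
    scores = [global_average_pool(c) for c in feature_map]
    remaining = list(range(len(feature_map)))
    weights = [max(s, 0.0) + 1e-8 for s in scores]
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Draw one position among the channels that are still available.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return sorted(chosen), scores


# Hypothetical toy input: three 2x2 channels.
toy = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
kept, scores = select_channels(toy, 2, random.Random(0))
print("scores:", scores, "kept channels:", kept)
```

Channels with higher pooled activation are more likely to survive, while the random draw still lets lower-scoring channels through occasionally.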

    Life-Cycle-Based Multicriteria Sustainability Evaluation of Industrial Parks: A Case Study in China

    With increasing concerns about environmental protection and global warming mitigation, new industrial organization modes such as “Ecoindustrial Park” and “Low Carbon Industrial Park” are emerging. Since ecoindustrial parks and low carbon industrial parks may offer multifaceted benefits to their users, it naturally follows that the sustainability assessment of industrial parks ought to adopt a multicriteria methodology. In this paper, a multicriteria sustainability evaluation framework is proposed in combination with life cycle analysis and applied to a low carbon and high end (LCHE) industrial park in Beijing, China. Results show that the LCHE industrial park can contribute to both energy saving and greenhouse gas emission mitigation compared with other industrial parks. In terms of economic performance, although the economic profits are considerable, the investment per constructed area is relatively high. The results of the sustainability analysis of the LCHE industrial park can thus shed light on the future upgrading of industrial parks.

    SYNONYMOUS CODON USAGE BIAS AND OVEREXPRESSION OF A SYNTHETIC xynB GENE FROM Aspergillus niger NL-1 IN Pichia pastoris

    To further improve the expression level of recombinant xylanase in Pichia pastoris, the xynB gene, encoding the mature peptide from Aspergillus niger NL-1, was designed and synthesized based on the synonymous codon bias of P. pastoris and an optimized G+C content. In total, 155 nucleotides were changed, and the GC content decreased from 57.7% to 43.6%. The synthetic xynB was inserted into pPICZαA and then integrated into P. pastoris GS115. The activity of the recombinant xylanase reached 1414.7 U/mL, induced with 0.8% methanol after 14-day cultivation at 28 °C in shake flasks, which was 267% higher than that of the native gene. Furthermore, a maximum xylanase activity of 20424.2 U/mL was obtained by high-density fermentation in a 5-L fermenter, which was the highest xylanase expression in P. pastoris yet reported. The recombinant xylanase had its optimal activity at pH 5.0 and a temperature of 50 °C, and was stable over a pH range of 4.5 to 8.0. Thus, this report provides an industrial means to produce the recombinant xylanase in P. pastoris.
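
The GC content figures quoted in this abstract are percentages of G and C bases in the coding sequence. As a minimal sketch of that calculation (the function name and the toy sequences are hypothetical; they are not the xynB gene or the authors' tooling):

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Hypothetical toy example: two synonymous encodings of Ala-Ala-Gly-Ser.
# Swapping codons changes the GC content without changing the peptide.
gc_rich = "GCCGCGGGCTCC"
gc_poor = "GCTGCAGGTTCT"
print(f"{gc_content(gc_rich):.1f}% -> {gc_content(gc_poor):.1f}%")
```

Codon optimization of the kind described above picks, for each amino acid, a synonymous codon preferred by the host, so the GC content can shift while the encoded protein stays the same.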

    Study on the Relationship between Early Shrinkage Cracking and Mechanical Properties of Nano-Clay Cement Mortar Based on Fractal Theory

    To study the influence of nano-clay on the crack resistance of cement-based materials, two kinds of nano-metakaolin (NMK) and two kinds of nano-attapulgite clay (NMA) were considered. The early cracking process and mechanical properties of nano-clay cement mortar (NCM) were studied using a plate knife-edge constraint test. Based on fractal theory, the distribution characteristics of NCM surface cracks were revealed, and a calculation method for the NCM maximum crack width was given. The results show that the cracking time of the NMK-3 specimen is 2 and 6 h later than that of NMK-1 and NMA-2, respectively; the smaller the particle size of nano-clay, the earlier the cracking time of the specimen. However, nano-clay effectively inhibited the expansion of mortar cracks, and the cracks on the surface of NCM were thin and sparse. At 28 days, the maximum crack width of NMK-3 was 46.7% and 33.3% lower than that of NMK-1 and NMA-2, respectively. NMK had the best improvement effect on the mechanical properties of cement mortar. The smaller the particle size, the more pronounced the improvement effect. The flexural strength ratio and compressive strength ratio at 7 and 28 days are 76.7%, 67.4%, and 61.2%, respectively. The distribution of surface cracks on NCM has fractal characteristics, and the fractal dimension of surface cracks is smaller than that of ordinary cement mortar. The larger the particle size of nano-clay, the smaller the fractal dimension of cracks. The quantitative relationship between fracture fractal dimension and NCM elastic modulus and shrinkage tensile stress is established.